Objective

To deliver a healthy variety of farm fresh fruits, vegetables, and grains to as many elementary school students as possible in California using Zipline drones based at two optimally located distribution centers.

Data

This study draws on a combination of spatial datasets covering state crop production, average crop yields, statistics on primary schools, and county boundaries:

Methodology

Finding the optimal location of a distribution center requires a preliminary spatial analysis of the number of students served by potential service areas and crop production per county.

Given a delivery limitation of 80km, we can draw a circle with a 80km radius around each potential location, forming a service area of 20106 km2. But which service areas would include the most students within their boundaries? We propose to treat each school location as a potential co-located distribution center. A 80km buffer was created around each primary school location in the state (n = 10043), which was then overlayed with school locations for spatial aggreagtion. Research began with a point-in-polygon analysis, summing the total enrolled students (EnrollTota) per service area/buffer.

#create buffers

counties = st_as_sf(counties)
schools = st_as_sf(schools)
schools_buffer = st_buffer(schools, 80000)

#convert from class sf to st

schools_spdf <- as_Spatial(schools)
schools_buffer_spdf <- as_Spatial(schools_buffer)
counties = as_Spatial(counties)

#transform crs to WGS84
sb_wgs84 = spTransform(schools_buffer_spdf, CRS("+proj=longlat +datum=WGS84"))
s_wgs84 = spTransform(schools_spdf, CRS("+proj=longlat +datum=WGS84"))
c_wgs84 = spTransform(counties, CRS("+proj=longlat +datum=WGS84"))

#loop through each school and save output to students_by_buffer.csv)

#students_by_buffer <-data.frame(scode=numeric(0)
#                                ,school_name=character(0)
#                                ,county_name=character(0)
#                                ,quantity=integer(0),stringsAsFactors=FALSE)

#for(i in 1:nrow(schools_buffer)) {
  
#  scode = schools_buffer[i,]$SCode
#  sname = as.character(schools_buffer[i,]$SchoolName)
#  cname = as.character(schools_buffer[i,]$CountyName)
#  x = st_intersection(schools_buffer[i,], schools)
#  sum_students = sum(x$EnrollTota.1)
#  students_by_buffer[i,] = c(scode, sname, cname, sum_students)
#  print(i)

#}

students_by_buffer = fread("students_by_buffer.csv")
students_by_buffer$scode = as.character(students_by_buffer$scode)
students_by_buffer$scode = str_pad(students_by_buffer$scode, 7, pad = "0")

dist_c = students_by_buffer  %>% 
                    group_by(county_name)  %>%
                    slice(which.max(quantity))  %>%
                    arrange(desc(quantity))

optimal_buffers = sb_wgs84[as.character(sb_wgs84$SCode) %in% dist_c$scode,]  
optimal_points = s_wgs84[as.character(s_wgs84$SCode) %in% dist_c$scode,]  
optimal_buffers$SCode = as.character(optimal_buffers$SCode)

#check=optimal_points@data

merged = merge(optimal_buffers@data, dist_c, by.x = 'SCode', by.y = 'scode')
merged <- merged[order(merged$CountyName),]
optimal_buffers$quantity = merged$quantity

We can narrow down one optimal location per county, optimal defined as servicing the highest number of students

Optimal locations per county

Immediately, we see that there are many overlapping service areas, that the most populous ones are in the south of the state, and that California can be roughly divided into two halves: Northern (NoCal) and Southern California (SoCal).

In order to ensure a geographically balanced distribution of service, we have decided to locate one distribution center in NoCal and one in SoCal (the artificial border drawn at Fresno).

Note the CountyName and/or geometry provided in the Ca-Schools dataset may have errors as multiple points appear in several counties and several others do not contain points

Food production

Due to regulations, distribution centers may receive produce sourced from the county in which they are located. Now that we have optimal locations per county, we are faced with the question of determining counties that can supply a diverse amount of fruits and vegetables. For this study, we will focus on fruits, vegetabes, and grains and will remove other types of crops from the dataset (i.e. cotton, managed wetland, grasses, deciduous shrubs). Since truck crops are earmarked for distant markets, not for regional consumption, they will also be removed. Idle crops were removed as well as vague categories such as Urban and Greenhouse.

Crop yields per acre are also required in order to ensure an adequate food amount per student. For the purpoe of this study, crop yields will be assumed to be consistent with current and future yields (note that categories between the ASR and crop shapefile do not always match, so some assumptions are made such as citrus = oranges, suptropical fruit = tangerines, bush berries = blueberries and mixed pasture is imputed as the average of all crops). It is assumed that the combined acerage of crops in any given county will appoximate the ARS yield rate.

Nutrition

According to the National Institue of Health, different foods and food groups are good sources for various macro-and micronutrients, so a diverse diet best ensures nutrient adequacy. This is even more important for the healthy growth and development of children. Zipline prioritizes nutrition and offering the hightest diversity of options to growing youngesters. With that in mind, it will be recommended to service less students with more nutritious options than more students with less nutritious options. Each county will receive a food diversity score, which is a percentage of total crop categories that are represented in that county.

crops <- readOGR(dsn= '/Users/alexandertedeschi/Downloads/FTS/CA-Crops/CropMapping_2014.shp', 
                 layer='CropMapping_2014', verbose = FALSE)

cropdata = crops@data
cropagg = cropdata %>% group_by(Crop2014, County) %>% summarise(acreage = sum(Acres))


inedible = c('Managed Wetland', 'Young Perennials', 'Idle', 'Miscellaneous Deciduous',
             'Miscellaneous Truck Crops', 'Flowers, Nursery and Christmas Tree Farms', 'Urban',
             'Greenhouse', 'Cotton', 'Miscellaneous Grasses', 'Safflower', 'Miscellaneous Field Crops')


cropagg = cropagg[!cropagg$Crop2014 %in% inedible,]
cropyield = read.csv("cropyield.csv")

cropagg = merge(cropagg, cropyield[, c("crop", 'ton.acre')], by.x ='Crop2014', by.y ='crop')

schooldata = cbind(optimal_buffers@data, optimal_buffers$Latitude)
schooldata$region = ifelse(schooldata$Latitude > 36.7378, 'NoCal', 'SoCal')

cropagg = merge(cropagg, schooldata[, c('CountyName', 'quantity', 'region')], by.x ='County', by.y ='CountyName')
cropagg$pps = round(((cropagg$ton.acre * 2000) * cropagg$acreage) / cropagg$quantity , 2)


nocal = schooldata[schooldata$region == 'NoCal',]
nocal = head(nocal[order(-nocal$quantity),], 20)

p<-ggplot(data=nocal, aes(x = reorder(CountyName, -quantity), y=quantity)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=quantity), y = 150000, color = 'yellow')+
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust=0.95)) +
  coord_flip() + scale_x_discrete(name ="county") + scale_y_continuous(name ="students served") +
  ggtitle('Top 20 county service areas by students served in NoCal')

p

test = cropagg[cropagg$County %in% nocal$CountyName,]


# Heatmap 
ggplot(test, aes(Crop2014, County, fill= pps)) + 
  geom_tile() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 65, hjust=0.95)) + 
  labs(fill = "Crop Yield \n(pounds\n per\n student)") +
  scale_x_discrete(name ="Crop") + 
  scale_y_discrete(name ="County") + 
  scale_fill_distiller(palette = "Blues") +
  ggtitle('Crop yield by top 20 service areas in NoCal')


socal = schooldata[schooldata$region == 'SoCal',]
socal = head(socal[order(-socal$quantity),], 20)

p<-ggplot(data=socal, aes(x = reorder(CountyName, -quantity), y=quantity)) +
  geom_bar(stat="identity", fill="tomato")+
  geom_text(aes(label=quantity), y = 250000, color = 'darkblue')+
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust=0.95)) +
  coord_flip() + scale_x_discrete(name ="county") + scale_y_continuous(name ="students served") +
  ggtitle('Top 20 county service areas by students served in SoCal')
p

test = cropagg[cropagg$County %in% socal$CountyName,]


# Heatmap 
ggplot(test, aes(Crop2014, County, fill= pps)) + 
  geom_tile() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 65, hjust=0.95)) + 
  labs(fill = "Crop Yield \n(pounds\n per\n student)") +
  scale_x_discrete(name ="Crop") + 
  scale_y_discrete(name ="County") + 
  scale_fill_distiller(palette = "Reds") +
  ggtitle('Crop yield by top 20 service areas in SoCal')

fd = cropagg %>% 
  group_by(County, region) %>%
  summarise(food_diversity = round(n_distinct(Crop2014)/35, 2), 
            population_s = mean(quantity),
            mean_pps = round(mean(pps),1)) 

fd_nocal = fd[fd$region == 'NoCal',]
fd_socal = fd[fd$region == 'SoCal',]
  
ggplot(fd_nocal, aes(x = population_s, y = food_diversity)) +
  geom_point(color = "orange", size = 3) + 
   geom_point(aes(x=506640 , y=0.86, size = 3), colour="purple") +
  geom_text(data = fd_nocal, label=fd_nocal$County, position=position_jitter(height  =0.05)) +  
  ggtitle("NoCal: Student population served vs food diversity") + theme(legend.position = "none")

  #theme_grey(base_size = 18) 


ggplot(fd_socal, aes(x = population_s, y = food_diversity)) +
  geom_point(color = "orange", size = 3) + 
  geom_point(aes(x= 1041339  , y=0.89, size = 3), colour="purple") +
  geom_text(data = fd_socal, label=fd_socal$County, position=position_jitter(height  =0.05))  +  
  ggtitle("SoCal: Student population served vs food diversity") + theme(legend.position = "none") 

  #theme_grey(base_size = 18) 


Recommendations

The graphs and heatmaps above demonstrate the tradeoff between serving a larger population of students and offering a diversity of foods. Counties with the highest populations, such as Los Angeles and Contra Costa, do not produce desirable staple foods such as potatoes, grains, wheat, or leafy greens.

The heatmaps above show that in SoCal, Yolo county lacks only 4 food crops out of 35 (89% food diversity score), and that its optimal distribution center would serve over 1 million students. Fresno and Kurn have the same food diversity scores (89%), but serve 8% fewer people. In NoCal, Stanislaus county stands out with only 5 missing crops (86% food diversity score) and a distribution center that would serve over half a million students. The next in line by diversity of food is Merced county (5 missing crops) but its distribution center would service less than half the amount of students than the one proposed in Stanislaus.

Proposed distribution center locations

Our distribution centers would service a total of 1,547,979 enrolled school students, or about 25% of the total population of enrolled school students in California

NoCal

stanislaus = optimal_buffers[optimal_buffers$CountyName == 'Stanislaus',]
stanislaus_dc = optimal_points[optimal_points$CountyName == 'Stanislaus',]
stanislaus = st_as_sf(stanislaus)
schools = st_as_sf(s_wgs84)
nocal_schools = st_intersection(stanislaus, schools)

labs<- c("<b>NoCal Distribution Center</b> <br/>
         <i> Stanislaus County </i>  <br/>
         Co-located with Rising Sun School <br/> </br>
         Total served schools: 795 </br> Tota served students: 506640 </br>
         Food diversity score: 86%")
    

m <- leaflet() %>% setView(lng =  -121.2879, lat = 37.61035, zoom = 8)
m %>% addProviderTiles(providers$CartoDB.Positron) %>% 
   addPolygons(data = c_wgs84,
              group = 'counties',
              opacity = 0.2,
              color = "black",
              fillOpacity = 0,
              weight = 2)  %>% 
  addPolygons(data = stanislaus,
              group = 'service area',
              fillColor = 'green',
              opacity = 1,
              color = "black",
              dashArray = "3",
              fillOpacity = 0.3,
              weight = 1) %>% 
  addCircleMarkers(
    data = nocal_schools,
    group = 'schools',
    radius = 2,
    color = 'blue',
    stroke = FALSE,
    fillOpacity = 0.7
  ) %>%
   addMarkers(
    lng =-121.2879, lat = 37.61035,
    label = lapply(labs, htmltools::HTML),
    labelOptions = labelOptions(noHide = T, textsize = "15px"))  %>%
    addLogo(img= 'ziplinelogo.jpg')

SoCal

yolo = optimal_buffers[optimal_buffers$CountyName == 'Yolo',]
yolo_dc = optimal_points[optimal_points$CountyName == 'Yolo',]
yolo = st_as_sf(yolo)
schools = st_as_sf(s_wgs84)
socal_schools = st_intersection(yolo, schools)

labs<- c("<b>SoCal Distribution Center</b> <br/>
         <i> Yolo County </i>  <br/>
         Co-located with Winters Joint Unified </br>
         Compass Charter School <br/> </br>
         Total served schools: 1665 </br> Tota served students: 1041339 </br>
         Food diversity score: 89%")
    


m <- leaflet() %>% setView(lng =  -118.8305, lat = 34.1552, zoom = 8)
m %>% addProviderTiles(providers$CartoDB.Positron) %>% 
   addPolygons(data = c_wgs84,
              group = 'counties',
              opacity = 0.2,
              color = "black",
              fillOpacity = 0,
              weight = 2)  %>% 
  addPolygons(data = yolo,
              group = 'service area',
              fillColor = 'green',
              opacity = 1,
              color = "black",
              dashArray = "3",
              fillOpacity = 0.3,
              weight = 1) %>% 
  addCircleMarkers(
    data = socal_schools,
    group = 'schools',
    radius = 2,
    color = 'blue',
    stroke = FALSE,
    fillOpacity = 0.7
  ) %>%
   addMarkers(
    lng =-118.8305, lat = 34.1552,
    label = lapply(labs, htmltools::HTML),
    labelOptions = labelOptions(noHide = T, textsize = "15px"))  %>%
    addLogo(img= 'ziplinelogo.jpg')

Limitations

More research on food production in California could be done with a longer turnaround time, which for this study amounted to less than 12 hours. The main assumption on which this study rests is that food producers would be willing to do business with Zipline and the California Department of Education, and to ship their produce to our distribution centers. The above recommendations also do not take into account terrain or weather data, which could impact the viability of the distribution centers and their service areas.